Performance standards for detector 
systems often include requirements for 
probability of detection and probability 
of false alarm at a specified level of 
statistical confidence. This paper reviews 
the accepted definitions of confidence 
level and of critical value. It describes the 
testing requirements for establishing 
either of these probabilities at a desired 
confidence level. These requirements 
are computable in terms of functions 
that are readily available in statistical 
software packages and general spreadsheet 
applications. The statistical interpretations 
of the critical values are discussed. A table 
is included for illustration, and a plot is 
presented showing the minimum required 
numbers of pass-fail tests. The results 
given here are applicable to one-sided 
testing of any system with performance 
characteristics conforming to a binomial 
distribution. 
1. Introduction 

In evaluating the efficacy of equipment that is meant 
for detection of hidden contraband or dangerous sub- 
stances, the instrument is often subjected to testing that 
measures its performance against requirements set forth 
in protocols set by national or international standards 
organizations. Performance requirements in these stan- 
dards include those for probability of detection (PD) and 
probability of false alarm (PFA) at a specified level of 
statistical confidence. 

The detection systems considered in this paper are all 
assumed to behave according to a binomial distribution. 
Only two outcomes are considered for independent 
trials with contraband present: the detection system either 
correctly reports detection or does not. Furthermore, 
the probability of detection must remain constant during 
the period of the testing. Otherwise, it may be meaningless 
to perform binomial-model-based tests to determine 
estimates of this quantity. Similarly, for tests with contraband 
absent, the detection system either correctly reports 
no detection or falsely reports the presence of contraband, 
and the probability of a false alarm is presumed to 
remain fixed throughout the period of testing. 

For a detection system, PD or PFA can only be deter- 
mined accurately by a sufficient number of trials. 
However, there is a number called the confidence level 
(CL) that gives some sense of adequacy of the results 
from a series of trials of a given size. 

CL is defined in terms of the binomial probability 
mass function, also called the binomial discrete density 
function, $b(m; n, p)$, 

$$b(m; n, p) = \Pr(\mathrm{BIN}(n, p) = m) = \frac{n!}{m!\,(n-m)!}\, p^{m} (1-p)^{n-m}, \tag{1}$$

where $m = 0, 1, \ldots, n$ denotes the number of successes 
(successful detections or false alarms) in $n$ independent trials 
with $p = PD$ or $p = PFA$, $0 < p < 1$ (see Johnson, Kotz, 
and Kemp, 1992). The number of successes in $n$ 
repeated independent trials conforms to this function if 
each trial can be scored as either success or failure and 
the probability of success is fixed. 
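
Equation (1) can be evaluated directly with integer binomial coefficients. The following is a minimal sketch, not part of the original paper; the function name `binom_pmf` and the sample values (30 trials, PD of 0.90) are illustrative assumptions.

```python
from math import comb


def binom_pmf(m: int, n: int, p: float) -> float:
    """b(m; n, p): probability of exactly m successes in n independent trials."""
    if not 0 <= m <= n:
        return 0.0
    return comb(n, m) * p**m * (1.0 - p) ** (n - m)


# Illustrative values (assumed): 30 trials of a detector whose true PD is 0.90.
n, p = 30, 0.90
probs = [binom_pmf(m, n, p) for m in range(n + 1)]

print(f"Pr(exactly 29 detections) = {probs[29]:.4f}")
print(f"Sum over m = 0..n         = {sum(probs):.6f}")
```

The probabilities over all possible values of m should sum to one, which is a quick sanity check on any implementation of Eq. (1).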

In Sec. 2 we discuss the definitions of CL and relat- 
ed critical values in detection problems. Section 3 gives 
statistical interpretation of these values in terms of 
hypothesis testing and confidence bounds. The note is 
concluded with Sec. 4 containing some examples. 



2. Definitions and Test Requirements 

The quantity CL can be loosely interpreted as the 
likelihood that any such system conforming to a binomial 
distribution with $m$ successes in a series of $n$ 
independent trials will have a true PD value greater than or 
equal to a chosen value, $PD_c$. 

More formally, the accepted definition of CL in 
setting testing requirements is stated in terms of the 
equation below. The usage of this term is consonant 
with that of ASTM standard C 1236-99 (2005). 

For a number $m$ of successes found in a series of $n$ 
pass-fail trials, with a fixed value of PD, designated 
$PD_c$, the confidence level $CL(m, n, PD_c)$ is defined by 
the equation 

$$CL(m, n, PD_c) = \sum_{j=0}^{m-1} b(j; n, PD_c). \tag{2}$$

In other words, if for $x = 0, 1, \ldots, n$ and $0 < p < 1$, 

$$\mathrm{BINCDF}(x, n, p) = \Pr(\mathrm{BIN}(n, p) \le x) = \sum_{k=0}^{x} \binom{n}{k} p^{k} (1-p)^{n-k} \tag{3}$$

denotes the binomial cumulative distribution function, 
then (2) can be expressed as 

$$CL(m, n, PD_c) = \mathrm{BINCDF}(m-1, n, PD_c). \tag{4}$$

Note that under this definition $CL(m, n, PD_c)$ cannot 
exceed $1 - PD_c^{\,n}$. 

To find the critical value $m_c$, i.e., the minimum value 
of $m$ establishing the $PD_c$ of interest with a preselected, 
fixed level of confidence, CL, one must invert the 
inequality 

$$\mathrm{BINCDF}(m_c - 1, n, PD_c) \ge CL. \tag{5}$$

It follows that $m_c$ is well defined only if 
$\mathrm{BINCDF}(n-1, n, PD_c) \ge CL$, i.e., if 

$$1 - PD_c^{\,n} \ge CL. \tag{6}$$

Since $\mathrm{BINCDF}(x, n, p)$ is a step function in $x$ (i.e., is 
not strictly increasing), it does not have a proper 
inverse function. If we set $m_c - 1$, $1 \le m_c \le n$, to be 
the least integer such that $\mathrm{BINCDF}(m_c - 1, n, PD_c)$ 
equals or exceeds CL, then 

$$m_c = \mathrm{INVBINCDF}(CL, n, PD_c) + 1, \tag{7}$$

where $\mathrm{INVBINCDF}(CL, n, p)$ is the inverse cumulative 
binomial distribution function (i.e., the smallest non-negative 
integer such that the cumulative distribution 
function evaluated at this value equals or exceeds CL). 
Versions of this function are available in many statistical 
software packages, including MATLAB (binoinv), 
R (qbinom), NAG, GAMS, IMSL, S-PLUS, and SAS, 
and in general spreadsheet applications, such as 
EXCEL (function CRITBINOM(n, p, CL)).¹ 
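
The definitions in Eqs. (4) to (7) translate into a few lines of code. The sketch below uses only the Python standard library and sums the probability mass function directly instead of calling a packaged inverse CDF; the function names (`binom_cdf`, `confidence_level_pd`, `inv_binom_cdf`, `critical_value_pd`) are hypothetical, and returning `None` for an undefined critical value is an assumed convention.

```python
from math import comb


def binom_cdf(x: int, n: int, p: float) -> float:
    """BINCDF(x, n, p) = Pr(BIN(n, p) <= x), Eq. (3)."""
    return sum(comb(n, k) * p**k * (1.0 - p) ** (n - k) for k in range(x + 1))


def confidence_level_pd(m: int, n: int, pd_c: float) -> float:
    """CL(m, n, PD_c) = BINCDF(m - 1, n, PD_c), Eq. (4)."""
    return binom_cdf(m - 1, n, pd_c)


def inv_binom_cdf(cl: float, n: int, p: float) -> int:
    """Smallest x in 0..n whose cumulative probability equals or exceeds cl."""
    total = 0.0
    for x in range(n + 1):
        total += comb(n, x) * p**x * (1.0 - p) ** (n - x)
        if total >= cl:
            return x
    return n  # only reached through rounding when cl is extremely close to 1


def critical_value_pd(cl: float, n: int, pd_c: float):
    """m_c from Eq. (7); None when Eq. (6) fails and m_c is not defined."""
    m_c = inv_binom_cdf(cl, n, pd_c) + 1
    return m_c if m_c <= n else None


print(f"CL(29, 30, 0.85) = {confidence_level_pd(29, 30, 0.85):.4f}")
print(f"m_c for CL=0.95, n=30, PD_c=0.85: {critical_value_pd(0.95, 30, 0.85)}")
print(f"m_c for CL=0.95, n=30, PD_c=0.95: {critical_value_pd(0.95, 30, 0.95)}")
```

Comparisons such as `total >= cl` are made in floating point, so a requirement that sits exactly on a step of the distribution function may need exact rational arithmetic.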

The binomial cumulative distribution function can 
be expressed through the incomplete beta function, 

$$\mathrm{BINCDF}(m-1, n, p) = 1 - I_p(m, n-m+1) = 1 - \frac{\int_0^{p} x^{m-1}(1-x)^{n-m}\,dx}{\int_0^{1} x^{m-1}(1-x)^{n-m}\,dx}, \tag{8}$$

$m > 0$, $n - m + 1 > 0$ (Abramowitz and Stegun, 1972), 
so that for fixed $m$ and $n$, $\mathrm{BINCDF}(m-1, n, p)$ is a 
decreasing function of $p$, $0 < p < 1$. This formula allows 
one to define $\mathrm{BINCDF}(m-1, n, p)$ for any real (non-integer) 
values $m$ and $n$ such that $0 < m < n + 1$. 

An analogous definition of CL applies to testing for 
PFA in systems where no contraband or dangerous substance 
is present. For any chosen value of PFA, designated 
$PFA_c$, the confidence level $CL(m, n, PFA_c)$ 
equals the probability that the number of false alarms 
occurring in a series of $n$ independent binary trials 
exceeds $m$. Thus, this level is defined by the equation 

$$CL = CL(m, n, PFA_c) = \sum_{k=m+1}^{n} b(k; n, PFA_c) = 1 - \mathrm{BINCDF}(m, n, PFA_c). \tag{9}$$



¹ Any mention of specific commercially available statistical software 
packages or general spreadsheet applications does not imply 
endorsement of or preference for these products by NIST. 
Similarly to the PD case, 

$$CL \le 1 - (1 - PFA_c)^{n}. \tag{10}$$



To find the maximum value $M_c$ of $M$, $M = 0, 1, \ldots, n - 1$, 
establishing the $PFA_c$ of interest with a preselected, 
fixed level of confidence CL, one must invert the 
inequality 

$$1 - \mathrm{BINCDF}(M_c, n, PFA_c) \ge CL. \tag{11}$$



To express $M_c$ through the function $\mathrm{INVBINCDF}(c, n, p)$, 
i.e., to establish the largest value $M$ satisfying 
(11), the formula 

$$\mathrm{INVBINCDF}(c, n, p) = n - 1 - \max\{x : \mathrm{BINCDF}(x, n, 1-p) \le 1 - c\} \tag{12}$$

can be employed. To prove (12), notice that for 
$x = 0, \ldots, n - 1$, 

$$\mathrm{BINCDF}(x, n, p) = 1 - \mathrm{BINCDF}(n-x-1, n, 1-p), \tag{13}$$

so that 

$$\begin{aligned}
n - 1 - \mathrm{INVBINCDF}(c, n, p) &= n - 1 - \min\{x : \mathrm{BINCDF}(x, n, p) \ge c\} \\
&= n - 1 - \min\{x : \mathrm{BINCDF}(n-x-1, n, 1-p) \le 1 - c\} \\
&= \max\{x : \mathrm{BINCDF}(x, n, 1-p) \le 1 - c\}.
\end{aligned} \tag{14}$$

Therefore, 

$$M_c = n - 1 - \mathrm{INVBINCDF}(CL, n, 1 - PFA_c), \tag{15}$$

so that $M_c \le n - 1$, and $M_c$ is not defined when 
$\mathrm{INVBINCDF}(CL, n, 1 - PFA_c) = n$, 
i.e., when $(1 - PFA_c)^{n} > 1 - CL$. 

Thus (15) and (7) show that under the same value of 
CL, when $PD_c = 1 - PFA_c$, a simple formula, 

$$m_c + M_c = n, \tag{16}$$

relates $m_c$ and $M_c$. 
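
Equation (15) can be checked against a direct search over inequality (11). The following sketch is illustrative only: the function names are hypothetical, `None` is an assumed convention for an undefined critical value, and the test case (n = 30, PFA_c = 0.15, CL = 0.95) is chosen here rather than taken from the paper.

```python
from math import comb


def binom_cdf(x: int, n: int, p: float) -> float:
    """BINCDF(x, n, p) = Pr(BIN(n, p) <= x)."""
    return sum(comb(n, k) * p**k * (1.0 - p) ** (n - k) for k in range(x + 1))


def inv_binom_cdf(cl: float, n: int, p: float) -> int:
    """Smallest x in 0..n with BINCDF(x, n, p) >= cl."""
    for x in range(n + 1):
        if binom_cdf(x, n, p) >= cl:
            return x
    return n


def critical_value_pfa(cl: float, n: int, pfa_c: float):
    """M_c = n - 1 - INVBINCDF(CL, n, 1 - PFA_c), Eq. (15); None if undefined."""
    m_cap = n - 1 - inv_binom_cdf(cl, n, 1.0 - pfa_c)
    return m_cap if m_cap >= 0 else None


def critical_value_pfa_direct(cl: float, n: int, pfa_c: float):
    """Largest M in 0..n-1 with 1 - BINCDF(M, n, PFA_c) >= CL, inequality (11)."""
    best = None
    for m in range(n):
        if 1.0 - binom_cdf(m, n, pfa_c) >= cl:
            best = m
    return best


n, cl, pfa_c = 30, 0.95, 0.15
M_c = critical_value_pfa(cl, n, pfa_c)
m_c = inv_binom_cdf(cl, n, 1.0 - pfa_c) + 1  # Eq. (7) with PD_c = 1 - PFA_c

print(f"M_c via Eq. (15):       {M_c}")
print(f"M_c via direct search:  {critical_value_pfa_direct(cl, n, pfa_c)}")
print(f"m_c + M_c = {m_c + M_c} (n = {n})")
```

Agreement between the two routes is a useful check when porting the formula to a package whose inverse CDF may use a different convention at the steps of the distribution function.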



3. Hypothesis Testing and Confidence Bounds on Binomial Probability 

We give here two statistical interpretations of Eq. (7) 
and Eq. (15). The first of these is related to a (lower) 
confidence limit for the binomial probability $p$. Such limits 
are supposed to provide a data-dependent interval 
containing the unknown $p$ with a given probability 
called the confidence coefficient (see Hahn and Meeker, 
1991). 

Assume that for the given CL, a lower confidence 
bound for $PD = p$ of confidence coefficient CL is 
desired: that is, for a binomial observation $X \sim \mathrm{BIN}(n, p)$, 
one requires a function $\underline{p} = \underline{p}(X, n, CL)$ such that 

$$\Pr(\underline{p}(X, n, CL) \le p) \ge CL. \tag{17}$$

The well known solution of this problem for $X \ge 1$ is 

$$\underline{p}(X, n, CL) = \max\{p : \mathrm{BINCDF}(X-1, n, p) \ge CL\} \tag{18}$$

(e.g., Casella and Berger, 2002). When $X = 0$, 
$\underline{p}(0, n, CL) = 0$. 
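
Because BINCDF(X - 1, n, p) is decreasing in p, the maximum in Eq. (18) can be located by bisection. The sketch below is an assumption-laden illustration rather than the authors' procedure: the function name `lower_confidence_bound`, the tolerance, and the sample inputs are all choices made here.

```python
from math import comb


def binom_cdf(x: int, n: int, p: float) -> float:
    """BINCDF(x, n, p) = Pr(BIN(n, p) <= x)."""
    return sum(comb(n, k) * p**k * (1.0 - p) ** (n - k) for k in range(x + 1))


def lower_confidence_bound(x: int, n: int, cl: float, tol: float = 1e-10) -> float:
    """Lower bound of Eq. (18): the largest p with BINCDF(x - 1, n, p) >= cl."""
    if x == 0:
        return 0.0  # stated separately in the text for X = 0
    lo, hi = 0.0, 1.0  # BINCDF(x-1, n, 0) = 1 >= cl and BINCDF(x-1, n, 1) = 0 < cl
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if binom_cdf(x - 1, n, mid) >= cl:
            lo = mid  # mid still satisfies the inequality, move right
        else:
            hi = mid
    return lo


# Assumed sample inputs: 29 and 30 successes in 30 trials at CL = 0.95.
print(f"29/30 successes: lower bound = {lower_confidence_bound(29, 30, 0.95):.4f}")
print(f"30/30 successes: lower bound = {lower_confidence_bound(30, 30, 0.95):.4f}")
```

For X = n the bound has the closed form (1 - CL)^(1/n), since BINCDF(n - 1, n, p) = 1 - p^n, which gives an independent check of the bisection.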

Thus with $m_c$ defined by (7), the inequalities $\underline{p} < p$ 
(strict inequality) and $X \le m_c$ (non-strict inequality) are 
equivalent. Therefore, the critical value $m_c$ has the 
interpretation of the largest value of the binomial 
$\mathrm{BIN}(n, p)$ variable such that the lower confidence 
bound for $p$ does not exceed $PD_c$. 

A related interpretation is provided by the statistical 
hypothesis testing problem, $H_0: p \ge PD_c$ under the 
alternative $H_1: p < PD_c$. The most powerful test of 
level $1 - CL$ rejects $H_0$ when the observed value $X$ 
exceeds the critical value $m_c$, $X > m_c$ (which means the 
same as $\underline{p}(X, n, CL) > PD_c$). 

The critical value for PFA has a similar statistical 
interpretation, namely, $M_c$ is the largest value of the 
binomial variable for which the upper confidence 
bound for the binomial probability does not exceed 
$PFA_c$. Indeed, an upper confidence bound of confidence 
coefficient CL has the form 

$$\bar{p}(X, n, CL) = 1 - \underline{p}(n - X, n, CL). \tag{19}$$

Identity (13) shows that 

$$\bar{p}(X, n, CL) = \min\{p : \mathrm{BINCDF}(X, n, p) \le 1 - CL\}. \tag{20}$$

Thus, $\bar{p}(M_c, n, CL) \le PFA_c$, 
but $\bar{p}(M_c + 1, n, CL) > PFA_c$. 




In terms of hypothesis testing with $H_0: p \le PFA_c$ 
and the alternative $H_1: p > PFA_c$, the most powerful 
test of level $1 - CL$ rejects $H_0$ when the observed value $X$ 
exceeds the critical value $M_c$, $X > M_c$. 
4. Examples 

Consider an example in which one finds twenty-nine 
correct results in a single set of thirty trials. If the 
system under test conforms to a binomial distribution, 
then based on the result of twenty-nine out of thirty 
correct responses in that one set of tests, one can make 
multiple correct inferences, such as: $PD \ge 0.95$ 
with 44 % confidence, $PD \ge 0.90$ with 81 % confidence, 
or $PD \ge 0.85$ with 95 % confidence. 
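
As a check on the arithmetic, the first of these figures can be worked out by hand from Eq. (4), since only the outcomes of 29 and 30 successes fall outside the sum. The numerical values below are rounded to four decimal places.

```latex
\begin{align*}
CL(29, 30, 0.95) &= \mathrm{BINCDF}(28, 30, 0.95) \\
                 &= 1 - b(29; 30, 0.95) - b(30; 30, 0.95) \\
                 &= 1 - 30\,(0.95)^{29}(0.05) - (0.95)^{30} \\
                 &\approx 1 - 0.3389 - 0.2146 = 0.4465 .
\end{align*}
```

The same steps with 0.90 and 0.85 in place of 0.95 give approximately 0.8163 and 0.9520, in line with the 81 % and 95 % quoted above; the 44 % figure corresponds to 0.4465 rounded down.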

One can easily construct a table which simul- 
taneously includes requirements for both PD and PFA. 

Table 1 gives the critical values $M_c$ and $n - m_c$ for 
68 % confidence to show the general characteristics of 
these quantities. These are the maximum permissible 
numbers of incorrect results that may be tolerated in 
establishing the specified PD or PFA values at this level 
of confidence. If the tabulated value is indicated as 
"*", then the number of trials in that set is insufficient 
to establish the corresponding PD or PFA at this confidence 
level. One may generate tables of this kind for 
any CL, PD, and PFA using Eq. (7) and Eq. (15) with 
the previously mentioned functions, such as binoinv 
or CRITBINOM, from statistical software packages or 
spreadsheet applications. The actual value of $M_c$ and 
$n - m_c$ given by these functions in the cases marked by 
"*" is -1. 

The symmetry of testing requirements when 
PFA = 1 - PD permits tabulating the results for PFA 
and PD in a single table, but it does not imply that PFA 
should or must always be chosen equal to 1 - PD. The 
PD and PFA values may be assigned independently in 
any testing protocol. In fact, to avoid disruption of the 
stream of commerce by large numbers of false alarms, 
it is often necessary to require inspection equipment to 
have PFA smaller than 1 - PD. 
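
A table of the kind described for Table 1 can be generated from Eq. (7) alone, because n - m_c for a given PD equals M_c for PFA = 1 - PD. The sketch below is not a reproduction of Table 1: the trial counts and PD values are illustrative choices made here, and `allowed_errors` is a hypothetical helper name.

```python
from math import comb


def inv_binom_cdf(cl: float, n: int, p: float) -> int:
    """Smallest x in 0..n whose cumulative binomial probability is >= cl."""
    total = 0.0
    for x in range(n + 1):
        total += comb(n, x) * p**x * (1.0 - p) ** (n - x)
        if total >= cl:
            return x
    return n


def allowed_errors(n: int, pd_c: float, cl: float):
    """n - m_c with m_c from Eq. (7); None where a table would show '*'."""
    errors = n - (inv_binom_cdf(cl, n, pd_c) + 1)
    return errors if errors >= 0 else None  # the raw value is -1 in the '*' cases


CL = 0.68
trial_counts = (10, 20, 30, 50, 100)  # illustrative, not the paper's rows
pd_values = (0.80, 0.90, 0.95)        # illustrative, not the paper's columns

print("n".rjust(5) + "".join(f"PD={pd:.2f}".rjust(10) for pd in pd_values))
for n in trial_counts:
    cells = []
    for pd in pd_values:
        e = allowed_errors(n, pd, CL)
        cells.append(("*" if e is None else str(e)).rjust(10))
    print(str(n).rjust(5) + "".join(cells))
```

Reading a PD column as a PFA column for PFA = 1 - PD gives the corresponding maximum number of false alarms.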

Table 1. Maximum permissible numbers of incorrect results for 
verifying a lower bound on PD or an upper bound on PFA with 
68 % confidence 

By solving (6) or (10), we obtain a formula for the 
minimum number of required trials $n_k$ needed to establish 
a given value of PD or PFA for the same CL, 

$$n_k = \lceil a \rceil, \tag{21}$$

with 

$$a = \frac{\log(1 - CL)}{\log PD} = \frac{\log(1 - CL)}{\log(1 - PFA)}. \tag{22}$$

Here $\lceil a \rceil$ denotes the smallest integer greater than or equal to $a$. 
This formula is useful in designing test protocols that 
give the most satisfactory requirement with the least 
amount of testing. Figure 1 shows $a$ plotted as a function 
of PD and CL. This function increases much 
more rapidly for PD approaching 1 than for $CL \to 1$. 
Similarly, $n_k$ in (21) would increase much more rapidly 
for $PFA \to 0$ than for $CL \to 1$. 
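
The minimum number of trials in Eqs. (21) and (22) is a one-line computation. The following sketch assumes the smallest n satisfying inequality (6) is wanted; the function name `min_trials` and the sample requirements are illustrative.

```python
from math import ceil, log


def min_trials(cl: float, pd_c: float) -> int:
    """n_k from Eqs. (21)-(22): smallest n with 1 - PD_c**n >= CL.

    For a false-alarm requirement, pass pd_c = 1 - PFA_c.
    Floating-point rounding may matter if the ratio is an exact integer.
    """
    a = log(1.0 - cl) / log(pd_c)
    return ceil(a)


# Assumed sample requirements, not taken from the paper's figure.
for pd_c, cl in ((0.90, 0.95), (0.95, 0.95), (0.99, 0.95)):
    n_k = min_trials(cl, pd_c)
    print(f"PD_c = {pd_c:.2f}, CL = {cl:.2f}: n_k = {n_k}")
```

With n_k trials every result must be correct; one trial fewer cannot reach the stated confidence even with a perfect score.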

When only the minimum number of trials $n_k$ is performed, 
the system must give 100 % correct results to 
establish the specified PD or PFA at the desired confidence 
CL. In statistical terms, $n_k$ is the smallest number 
of trials with 100 % correct detections such that the 
CL-lower confidence bound for detection probability 
exceeds the given value PD. The same is true when 
there are no false alarms, with the CL-upper confidence 
bound on the false alarm probability being less than 
PFA. A table such as Table 1 will show how many 
errors may be permitted if a larger number of trials is 
carried out, while still establishing the specified PD or 
PFA at the desired CL. 
Fig. 1. The minimum required number of tests to establish a given value of PD (or 1-PFA) for a given CL. 



5. Discussion and Conclusions 

The formula for $n_k$ shows that requiring either PD or 
CL to be too near unity can result in impossibly large 
numbers of pass-fail tests. If such rigorous criteria are 
in fact required then one should search for some 
method of verification different from pass-fail testing. 

The results presented here make it possible to design 
pass-fail testing protocols based on functions readily 
available in statistical software packages and general 
spreadsheet applications. 